TOS: A Text Organizing System

Author

  • Kemal Koymen
Abstract

This paper reports research undertaken to conceptualize, design and implement a system for automatic indexing, classification and repositing of text items, which may be any aggregates of information in the English language on a computer-readable medium, in a standard format. The ultimate goal of the research reported here is to devise all automatic processes which would read text items, and then index, classify and reposit them for subsequent search and retrieval. Only portions of the path to this goal have been made fully automatic. These portions consist of the following automatic processes:

1. Scanning the text items and assigning candidate index terms (words or phrases) to the items.
2. Discriminating and rejecting candidate index terms determined to be ineffective in forming a classification automatically.
3. Generating a classification system and repositing the text items in accordance with this system.

To complete the process, some degree of user involvement, on an interactive basis, is incorporated in the system, particularly for discriminating the index terms which do not contribute to a satisfactory classification. Based on various reports derived automatically, the user can guide the system to systematically search for terms which are not helpful for, and even hamper, the subsequent classification and information retrieval, until the performance of the system is judged to be adequate.

The specific achievements of the reported research are stated below:

1. System interactiveness
2. Automatic index phrase recognition
3. Summary report, informing the user of the impact of user-elected decisions to delete terms on a mass basis and advising him of percentages of reduction in index term vocabulary size or average number of index terms per item resulting from such mass term deletions.
4. Affinity dictionary, giving the user the ability to locate synonymous or near synonymous

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-75-01. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/719

Author affiliation: Kemal Koymen, Moore School of Electrical Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19174.

*The author is currently an assistant professor at the Department of Mathematics and Statistics, and Computer Science, American University, Washington, D.C. 20016. The reported research was supported under contract N0014-67-A-0216-0007 from the Information Systems Program, Office of Naval Research.
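
As an illustration of the summary report described in the abstract (percentage reduction in index term vocabulary size and in average number of index terms per item after a mass term deletion), the following is a minimal Python sketch. It is not the TOS implementation: the toy items, the function names, and the document-frequency thresholds used to pick terms for deletion are all assumptions made for the example, since the abstract does not say how terms are selected.

```python
from collections import Counter

# Hypothetical toy data: each item maps to its set of candidate index terms.
# (Assumption: how TOS extracts terms is not described in the abstract.)
items = {
    "doc1": {"text", "index", "retrieval"},
    "doc2": {"text", "classification"},
    "doc3": {"text", "index", "naval"},
}


def document_frequency(items):
    """Count in how many items each term occurs."""
    df = Counter()
    for terms in items.values():
        df.update(terms)
    return df


def mass_delete(items, min_df, max_df):
    """Drop terms whose document frequency falls outside [min_df, max_df].

    The frequency thresholds are an illustrative deletion rule only,
    not the criterion used by TOS.
    """
    df = document_frequency(items)
    keep = {term for term, n in df.items() if min_df <= n <= max_df}
    return {name: terms & keep for name, terms in items.items()}


def summary_report(before, after):
    """Percentage reduction in vocabulary size and in average terms per item."""
    def vocab_size(d):
        return len(set().union(*d.values())) if d else 0

    def avg_terms(d):
        return sum(len(t) for t in d.values()) / len(d) if d else 0.0

    def pct_drop(old, new):
        return 100.0 * (old - new) / old if old else 0.0

    return {
        "vocabulary_reduction_pct": pct_drop(vocab_size(before), vocab_size(after)),
        "avg_terms_reduction_pct": pct_drop(avg_terms(before), avg_terms(after)),
    }


# Delete very rare and very common terms in one pass, then report the impact.
reduced = mass_delete(items, min_df=2, max_df=2)
report = summary_report(items, reduced)
```

On this toy data only the term `index` survives the deletion, so `doc2` is left with no index terms at all. A report of this kind is what would let a user judge whether a proposed mass deletion is too aggressive before committing to it.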


Similar resources

Self-Organizing-Map-Based Metamodeling for Massive Text Data Exploration

In this study, we describe the use of the self-organizing map (SOM) as a metamodeling technique to design a parallel text data exploration system. Firstly, the large textual collections are divided into various small data subsets. Based on the different subsets, different unitary SOM models, i.e., base models, are then trained for word clustering map. In this phase, different SOM models are imp...


SOMLib: A Distributed Digital Library System based on Self-Organizing Maps

We describe an architecture for a distributed digital library system based on an unsupervised neural network model, namely the Self-Organizing Map. The system allows the clustering of text documents forming the basis for intelligent information retrieval. User profiles can be combined with full text queries or sample texts to locate documents within the library system. Contrary to conventional a...


Reflections about Symbolic vs. Iconic Representations in TUIs

Currently designed Tangible User Interfaces (TUIs) propose both iconic and symbolic tangible objects (TOs). Since iconic TOs should enable users to interact more naturally, as in the real world, and, hypothetically, require less learning time than symbolic TOs, some questions arise: Why do symbolic TOs exist? When should iconic or symbolic representation be used in TOs? This paper discusses these questions an...


Automatic rule generation for linguistic features analysis using inductive learning technique: linguistic features analysis in TOS drive TTS system

The linguistic features analysis for input text plays an important role in achieving natural prosodic control in text-to-speech (TTS) systems. In a conventional scheme, experts refine suspicious if-then rules and change the tree structure manually to obtain correct analysis results when input texts have been analyzed incorrectly. However, altering the tree structure drastically is difficul...


Automating XML markup of text documents

We present a novel system for automatically marking up text documents into XML and discuss the benefits of XML markup for intelligent information retrieval. The system uses the Self-Organizing Map (SOM) algorithm to arrange XML marked-up documents on a two-dimensional map so that similar documents appear closer to each other. It then employs an inductive learning algorithm C5 to automatically ex...



Journal:
  • Inf. Process. Manage.

Volume 11

Publication date: 1975